5e15fb59326e7a9c3d6558ca74621683-Paper.pdf
MixTraining enhances data augmentation by utilizing augmentations of different strengths while excluding the strong augmentations of certain training samples that may be detrimental to training. In addition, it addresses localization noise and missing labels in human annotations by incorporating pseudo boxes that can compensate for these errors.
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
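The MixTraining idea above — weak augmentation for all samples, strong augmentation withheld from samples it would hurt — can be sketched loosely as follows. This is a minimal illustration, not the paper's implementation; the loss-based difficulty criterion, the threshold, and the string-suffix "augmentations" are all stand-ins.

```python
import random

def mix_training_augment(sample, sample_loss, loss_threshold=1.0,
                         weak_augs=None, strong_augs=None):
    """Pick an augmentation for one training sample.

    Samples the model still finds hard (high loss) receive only a weak
    augmentation; easy samples get a strong one. All names here are
    illustrative placeholders, not the paper's API.
    """
    weak_augs = weak_augs or [lambda s: s + "_flip", lambda s: s + "_crop"]
    strong_augs = strong_augs or [lambda s: s + "_mixup", lambda s: s + "_mosaic"]
    if sample_loss < loss_threshold:   # easy sample: strong augmentation is safe
        return random.choice(strong_augs)(sample)
    return random.choice(weak_augs)(sample)
```

In a real detector the difficulty signal would come from per-sample training loss or score, and the augmentations would be image transforms rather than string suffixes.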
An Operational Deep Learning System for Satellite-Based High-Resolution Global Nowcasting
Agrawal, Shreya, Hassen, Mohammed Alewi, Brempong, Emmanuel Asiedu, Babenko, Boris, Zyda, Fred, Graham, Olivia, Li, Di, Merchant, Samier, Potes, Santiago Hincapie, Russell, Tyler, Cheresnick, Danny, Kakkirala, Aditya Prakash, Rasp, Stephan, Hassidim, Avinatan, Matias, Yossi, Kalchbrenner, Nal, Gupta, Pramod, Hickey, Jason, Bell, Aaron
Precipitation nowcasting, which predicts rainfall up to a few hours ahead, is a critical tool for vulnerable communities in the Global South frequently exposed to intense, rapidly developing storms. Timely forecasts provide a crucial window to protect lives and livelihoods. Traditional numerical weather prediction (NWP) methods suffer from high latency, low spatial and temporal resolution, and significant gaps in accuracy across the world. Recent machine learning-based nowcasting methods, common in the Global North, cannot be extended to the Global South due to extremely sparse radar coverage. We present Global MetNet, an operational global machine learning nowcasting model. It leverages the Global Precipitation Mission's CORRA dataset, geostationary satellite data, and global NWP data to predict precipitation for the next 12 hours. The model operates at a high resolution of approximately 0.05° (~5km) spatially and 15 minutes temporally. Global MetNet significantly outperforms industry-standard hourly forecasts and achieves significantly higher skill, making forecasts useful over a much larger area of the world than previously available. Our model demonstrates better skill in data-sparse regions than even the best high-resolution NWP models achieve in the US. Validated using ground radar and satellite data, it shows significant improvements across key metrics like the critical success index and fractions skill score for all precipitation rates and lead times. Crucially, our model generates forecasts in under a minute, making it readily deployable for real-time applications. It is already deployed for millions of users on Google Search. This work represents a key step in reducing global disparities in forecast quality and integrating sparse, high-resolution satellite observations into weather forecasting.
- North America > United States (1.00)
- Asia > Japan (0.05)
- Asia > East Asia (0.04)
- (16 more...)
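The Global MetNet abstract reports improvements in the critical success index (CSI). The CSI itself has a standard definition, sketched below for thresholded precipitation grids; the grid shapes, units, and threshold are illustrative, and real evaluation would first co-register forecast and observation grids.

```python
import numpy as np

def critical_success_index(forecast, observed, threshold_mm=1.0):
    """CSI = hits / (hits + misses + false alarms), computed on
    binary exceedance masks at a given precipitation threshold."""
    f = np.asarray(forecast, float) >= threshold_mm
    o = np.asarray(observed, float) >= threshold_mm
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    denom = hits + misses + false_alarms
    return float(hits / denom) if denom else float("nan")
```

A perfect forecast scores 1.0; a forecast with one hit, one miss, and one false alarm scores 1/3.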
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
Wang, Bo, Cheng, Qinyuan, Peng, Runyu, Bao, Rong, Li, Peiji, Guo, Qipeng, Li, Linyang, Zeng, Zhiyuan, Zhou, Yunhua, Qiu, Xipeng
Post-training processes are essential phases in grounding pre-trained language models to real-world tasks, with learning from demonstrations or preference signals playing a crucial role in this adaptation. We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. Through rigorous mathematical derivation, we demonstrate that both SFT and preference learning methods like Direct Preference Optimization (DPO) operate within the same optimal policy-reward subspace, with SFT representing a special case of implicit reward learning. Our analysis reveals a critical limitation in conventional SFT: the KL divergence term in distribution matching becomes constant with respect to the policy during optimization, failing to constrain model updates. To address this, we propose a simple yet effective learning rate reduction approach that yields significant performance improvements (up to \textbf{25\%} relative gain and \textbf{6\%} absolute win rate increase) in instruction following tasks. Additionally, we derive alternative SFT objectives from various f-divergence functions that preserve the KL term during optimization, further enhancing post-DPO model performance. Finally, we extend the theoretical relationship between LLM logits and Q-functions from preference learning to the SFT context, providing mathematical derivations and experimental validation.
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- (10 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
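One way to make the abstract's claim about the constant KL term concrete is the standard identity for forward-KL distribution matching, a common reading of SFT (a sketch of the well-known decomposition, not necessarily the paper's exact derivation):

```latex
\mathrm{KL}\left(p_{\mathrm{data}} \,\|\, \pi_\theta\right)
  = \underbrace{-H(p_{\mathrm{data}})}_{\text{constant in }\theta}
  \;+\; \underbrace{\mathbb{E}_{y \sim p_{\mathrm{data}}}\!\left[-\log \pi_\theta(y)\right]}_{\text{cross-entropy SFT loss}}
```

The policy-independent entropy term drops out of the gradient, so nothing in the objective bounds how far $\pi_\theta$ drifts during optimization — one illustration of the limitation the abstract describes, and consistent with the proposed remedy of shrinking the learning rate.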
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
Lu, Ke-Han, Chen, Zhehuai, Fu, Szu-Wei, Yang, Chao-Han Huck, Huang, Sung-Feng, Yang, Chih-Kai, Yu, Chee-En, Chen, Chun-Wei, Chen, Wei-Chih, Huang, Chien-yu, Lin, Yi-Cheng, Lin, Yu-Xiang, Fu, Chi-An, Kuan, Chun-Yi, Ren, Wenze, Chen, Xuanjun, Huang, Wei-Ping, Hu, En-Pei, Lin, Tzu-Quan, Wu, Yuan-Kuei, Huang, Kuan-Po, Huang, Hsiao-Ying, Chou, Huang-Cheng, Chang, Kai-Wei, Chiang, Cheng-Han, Ginsburg, Boris, Wang, Yu-Chiang Frank, Lee, Hung-yi
We introduce DeSTA2.5-Audio, a general-purpose Large Audio Language Model (LALM) designed for robust auditory perception and instruction-following, without requiring task-specific audio instruction-tuning. Recent LALMs typically augment Large Language Models (LLMs) with auditory capabilities by training on large-scale, manually curated or LLM-synthesized audio-instruction datasets. However, these approaches have often suffered from catastrophic forgetting of the LLM's original language abilities. To address this, we revisit the data construction pipeline and propose DeSTA, a self-generated cross-modal alignment strategy in which the backbone LLM generates its own training targets. This approach preserves the LLM's native language proficiency while establishing effective audio-text alignment, thereby enabling zero-shot generalization without task-specific tuning. Using DeSTA, we construct DeSTA-AQA5M, a large-scale, task-agnostic dataset containing 5 million training samples derived from 7,000 hours of audio spanning 50 diverse datasets, including speech, environmental sounds, and music. DeSTA2.5-Audio achieves state-of-the-art or competitive performance across a wide range of audio-language benchmarks, including Dynamic-SUPERB, MMAU, SAKURA, Speech-IFEval, and VoiceBench. Comprehensive comparative studies demonstrate that our self-generated strategy outperforms widely adopted data construction and training strategies in both auditory perception and instruction-following capabilities. Our findings underscore the importance of carefully designed data construction in LALM development and offer practical insights for building robust, general-purpose LALMs.

The development of general-purpose artificial intelligence has become a central focus in contemporary AI research, driven by the remarkable performance of large language models (LLMs) across various natural language understanding and generation tasks [1]-[7]. Building on these advancements, a promising direction is to equip LLMs with multi-modal understanding capabilities, leading to the emergence of Large Audio Language Models (LALMs) [8]-[22] and Large Vision Language Models (LVLMs) [23]-[27]. This paper focuses on building a general-purpose LALM, illustrated in Figure 1. To develop a general-purpose LALM, two core capabilities are essential: auditory perception and instruction-following. Auditory perception refers to the comprehensive processing of auditory information, including speech, non-verbal cues, background sounds, and music.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (8 more...)
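The self-generated alignment strategy described above — the backbone LLM writing its own training targets from a textual description of the audio — can be sketched roughly as below. The prompt format, field names, and `llm_generate` stand-in are all assumptions for illustration, not DeSTA's actual pipeline.

```python
def build_self_generated_example(audio_caption, instruction, llm_generate):
    """Self-generated cross-modal alignment, loosely sketched: instead
    of a human- or external-LLM-written answer, the *backbone* LLM
    produces the training target itself, so targets stay inside its
    native output distribution and language ability is preserved."""
    prompt = f"[Audio description: {audio_caption}]\n{instruction}"
    target = llm_generate(prompt)  # backbone LLM writes its own label
    return {"input_text": prompt, "target": target}

# Toy stand-in for the backbone LLM.
stub_llm = lambda p: f"Response grounded in: {p.splitlines()[0]}"
example = build_self_generated_example(
    "a dog barking over distant traffic", "What do you hear?", stub_llm)
```

In training, the audio-description side would be replaced by actual audio features aligned to the text, while the self-generated target is kept as the supervision signal.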
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation
Fujita, Yusuke, Mizumoto, Tomoya, Kojima, Atsushi, Liu, Lianbo, Sudo, Yui
We propose an instruction-following audio comprehension model that leverages the dialogue continuation ability of large language models (LLMs). Instead of directly generating target captions in training data, the proposed method trains a model to produce responses as if the input caption triggered a dialogue. This dialogue continuation training mitigates the caption variation problem. Learning to continue a dialogue effectively captures the caption's meaning beyond its surface-level words. As a result, our model enables zero-shot instruction-following capability without multitask instruction tuning, even trained solely on audio captioning datasets. Experiments on AudioCaps, WavCaps, and Clotho datasets with AudioBench audio-scene question-answering tests demonstrate our model's ability to follow various unseen instructions.
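The dialogue-continuation construction described above can be sketched minimally: rather than training the model to emit the caption verbatim, the caption is treated as a user utterance and an LLM's natural reply to it becomes the target. The function and turn format below are illustrative assumptions, not the paper's recipe; `respond` stands in for the frozen LLM.

```python
def make_dialogue_continuation_example(caption, respond):
    """AC/DC-style target construction, sketched: the target is a
    dialogue continuation of the caption, not the caption itself,
    so training captures the caption's meaning beyond surface words."""
    user_turn = f"User: {caption}"
    reply = respond(user_turn)  # continue the dialogue, don't copy the caption
    return {"audio_side_text": caption, "target": f"Assistant: {reply}"}
```

The contrast with plain captioning is the supervision signal: two differently worded captions of the same sound tend to elicit similar replies, which is how this construction mitigates caption variation.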
Deep Learning-based Multi Project InP Wafer Simulation for Unsupervised Surface Defect Detection
Cantú, Emílio Dolgener, Wittmann, Rolf Klemens, Abdeen, Oliver, Wagner, Patrick, Samek, Wojciech, Baier, Moritz, Lapuschkin, Sebastian
Quality management in semiconductor manufacturing often relies on template matching with known golden standards. For Indium-Phosphide (InP) multi-project wafer manufacturing, low production scale and high design variability lead to such golden standards being typically unavailable. Defect detection, in turn, is manual and labor-intensive. This work addresses this challenge by proposing a methodology to generate a synthetic golden standard using Deep Neural Networks, trained to simulate photo-realistic InP wafer images from CAD data. We evaluate various training objectives and assess the quality of the simulated images on both synthetic data and InP wafer photographs. Our deep-learning-based method outperforms a baseline decision-tree-based approach, enabling the use of a 'simulated golden die' from CAD plans in any user-defined region of a wafer for more efficient defect detection. We apply our method to a template matching procedure, to demonstrate its practical utility in surface defect detection.
- Europe > Germany (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > Ireland (0.04)
- (2 more...)
- Semiconductors & Electronics (0.87)
- Information Technology > Hardware (0.48)
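Once a simulated golden die exists, the downstream template-matching step amounts to comparing it against the wafer photograph. The sketch below is a deliberately minimal stand-in for that comparison, assuming already-registered, intensity-normalized grayscale images; the threshold value is arbitrary.

```python
import numpy as np

def defect_map(photo, simulated_golden, threshold=0.2):
    """Flag defect candidates where the wafer photograph deviates
    from the CAD-derived 'simulated golden die' by more than a
    threshold. Real use would need image registration, illumination
    normalization, and morphological filtering of the mask."""
    diff = np.abs(np.asarray(photo, float) - np.asarray(simulated_golden, float))
    return diff > threshold
```

Because the golden die is simulated from CAD plans, the comparison can be run in any user-defined wafer region, which is what removes the need for a physically measured reference.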
1264a061d82a2edae1574b07249800d6-Paper.pdf
One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains, by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically only depends on a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy-problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance.
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
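The HRA aggregation step described above — one value head per component reward, combined for action selection — reduces to a weighted sum of per-component Q-values. A minimal illustration, not the paper's network; the weights and toy numbers are assumptions.

```python
import numpy as np

def hra_q_values(component_qs, weights=None):
    """Hybrid Reward Architecture aggregation: each head learns
    Q_k(s, a) for one component reward; the agent acts greedily on
    the (weighted) sum over heads. Shape: (n_components, n_actions)."""
    q = np.asarray(component_qs, float)
    w = np.ones(q.shape[0]) if weights is None else np.asarray(weights, float)
    return w @ q  # aggregate Q(s, a) = sum_k w_k * Q_k(s, a)

# Example: two reward components (e.g. one per pellet), three actions.
q_agg = hra_q_values([[1.0, 0.0, 2.0],
                      [0.5, 3.0, 0.0]])
best_action = int(np.argmax(q_agg))
```

Each head's component reward depends on only a few features, so each Q_k is easier to approximate with a low-dimensional representation than the monolithic Q, which is the paper's central argument.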